From Flat Lists to Taxonomies: Bottom-up Concept Scheme Generation in Linked Statistical Data

Authors

  • Albert Meroño-Peñuela
  • Ashkan Ashkpour
  • Christophe Guéret
Abstract

RDF Data Cube allows the modeling and publishing of Linked Statistical Data (LSD) in the Semantic Web. Often, the variable values of such statistical data are not standardized and are represented by literals that are too narrow, too concrete, or wrongly typed. Generally, adequate standard concept schemes for such variables (especially in very specific domains such as historical religious denominations or building types in the pre-industrial era) do not exist and need to be created. This is a manual task that requires considerable expert knowledge and time. We present a workflow that combines hierarchical clustering and semantic tagging to automatically build concept schemes in a data-driven, bottom-up way, leveraging lexical and semantic properties of the non-standard dimension values. We apply our workflow in two different use cases and discuss its usefulness, limitations, and possible improvements.
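To make the bottom-up idea concrete, the following is a minimal sketch of the hierarchical clustering step only, and not the authors' implementation. It assumes that lexical similarity is measured with Python's `difflib.SequenceMatcher` and that clusters are merged by average linkage; both choices, along with the helper names `similarity`, `leaves`, `cluster_similarity`, and `build_tree`, are illustrative assumptions. The semantic tagging step described in the abstract is not covered.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1] between two literals (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leaves(node):
    """Return the literals under a tree node (a str leaf or a (left, right) tuple)."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def cluster_similarity(c1, c2) -> float:
    """Average-linkage similarity: mean pairwise similarity across two clusters."""
    pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def build_tree(values):
    """Repeatedly merge the two most similar clusters until one tree remains.

    Assumes `values` is a non-empty list of strings.
    """
    nodes = list(dict.fromkeys(values))  # drop duplicates, keep input order
    while len(nodes) > 1:
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: cluster_similarity(nodes[p[0]], nodes[p[1]]),
        )
        merged = (nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


# Hypothetical dimension values, invented for illustration only.
example_values = [
    "Roman Catholic",
    "Roman Catholics",
    "Dutch Reformed",
    "Dutch Reformed Church",
]
example_tree = build_tree(example_values)
```

Each inner tuple of the resulting tree is a candidate broader concept grouping its members; in a full workflow, such groups would still need labels, which is where a tagging step could come in.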


Similar resources

A Novel Approach to Trace Time-Domain Trajectories of Power Systems in Multiple Time Scales Based Flatness

This paper works on the concept of flatness and its practical application for the design of an optimal transient controller in a synchronous machine. The feedback linearization scheme of interest requires the generation of a flat output from which the feedback control law can easily be designed. Thus the computation of the flat output for reduced order model of the synchronous machine with simp...


Hierarchical Boosting for Gene Function Prediction

Functional classification of genes using diverse bio-molecular data obtained from high-throughput technologies is a fundamental problem in bioinformatics and functional genomics. Genes are organized and classified according to a hierarchical classification scheme and each gene will participate in multiple activities. Flat classifiers, that work on non-hierarchical classification problems indepe...


Clustering Concept Hierarchies from Text

We present a novel approach to learning taxonomies or concept hierarchies from text. The approach is based on Formal Concept Analysis, a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Our approach is based on the distributional hypothesis, i.e. that nouns or terms are similar to the extent to which they share contexts. F...


The construction and exploration of attribute-value taxonomies in data mining

With the widespread computerization in science, business, and government, the efficient and effective discovery of interesting information and knowledge from large databases becomes essential. Knowledge Discovery in Databases (KDD) or Data Mining plays a key role in data analysis and has been found to be beneficial in many fields. Much previous research and many applications have focused on the...


Structure- and Extension-Informed Taxonomy Alignment

Ontologies and concept taxonomies help software systems organize data more effectively for particular application domains. Ontologies also enable sharing and integration of data from different domains and data sources. However, ontologies from different domains are rarely identical; thus, there is need for techniques to find alignments between concepts in different ontologies and taxonomies. In...




Publication date: 2014